Contributions of Jitter and Shimmer in the Voice for Fake Audio Detection

Abstract

Fake audio detection (FAD) aims to identify fraudulent speech generated through advanced speech-synthesis techniques. Most current FAD methods rely solely on a deep neural network (DNN) framework with either waveforms or commonly used acoustic features to extract high-level representations, overlooking the analysis of prosody differences between genuine and fake speech. Prosody carries important cues about the naturalness and emotional content of speech, which can be leveraged in fake audio detection. This paper explicitly investigates the prosody information represented by jitter and shimmer features. On the basis of our investigation, we found strong evidence that obvious differences exist between fake and real speech at the jitter and shimmer level, particularly for a feature that has large dynamic variation. To ensure accurate estimation of F0 and better jitter and shimmer features, we propose using two additional methods, YIN and SWIPE, in place of the IRAPT algorithm in the extraction process. Moreover, we design a DNN-FAD system combining Mel-spectrogram and jitter and shimmer features. The effectiveness of the proposed method is evaluated on the datasets of the Audio Deep Synthesis Detection (ADD) 2022 and 2023 challenges. The experimental results show that both static and continuous jitter and shimmer features, especially those extracted with the YIN and SWIPE algorithms, provide complementary knowledge to traditional spectrum-based FAD systems. The optimal system can effectively reduce the equal error rate from 41.29% to 35.77% on the ADD2023 challenge, achieving a relative improvement of 13.37%.
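The abstract does not spell out how jitter and shimmer are computed, so the following is only a minimal sketch under stated assumptions. It uses the common "local" definitions (mean absolute cycle-to-cycle difference divided by the mean value), which may differ from the variants used in the paper. It also assumes that per-cycle pitch periods and peak amplitudes have already been extracted by some F0 tracker. The function names are hypothetical. The last helper only reproduces the relative-improvement arithmetic quoted in the abstract.

```python
def _mean_abs_diff_ratio(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Assumed 'local' jitter: cycle-to-cycle variation of the pitch period."""
    return _mean_abs_diff_ratio(periods)


def local_shimmer(amplitudes):
    """Assumed 'local' shimmer: cycle-to-cycle variation of peak amplitude."""
    return _mean_abs_diff_ratio(amplitudes)


def relative_improvement(baseline_eer, new_eer):
    """Relative reduction of the equal error rate, as a fraction."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy input (made-up values, not data from the paper).
periods_s = [0.010, 0.011, 0.010, 0.011]   # pitch periods in seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]      # per-cycle peak amplitudes
print(f"jitter  = {local_jitter(periods_s):.4f}")
print(f"shimmer = {local_shimmer(amplitudes):.4f}")

# EER figures quoted in the abstract: 41.29% reduced to 35.77%.
print(f"relative improvement = {relative_improvement(41.29, 35.77):.2%}")
```

In a real pipeline the period and amplitude sequences would come from the F0 estimator (IRAPT, YIN or SWIPE in the paper), so the quality of these features depends directly on the accuracy of that estimate.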

Similar resources

Jitter, Shimmer, and Noise in Pathological Voice Quality Perception

Although jitter, shimmer, and turbulent noise characterize all voice signals, their perceptual importance has not been established psychoacoustically. To determine which of these acoustic attributes is important in listeners’ perceptions of pathologic voices, listeners used a speech synthesizer to adjust levels of jitter, shimmer, and noise so that synthetic voices matched natural pathological ...

Jitter and shimmer measurements for speaker recognition

Jitter and shimmer are measures of the cycle-to-cycle variations of fundamental frequency and amplitude, respectively, which have been largely used for the description of pathological voice quality. Since they characterise some aspects concerning particular voices, it is a priori expected to find differences in the values of jitter and shimmer among speakers. In this paper, several types of jit...

Using Jitter and Shimmer in speaker verification

Jitter and shimmer are measures of the fundamental frequency and amplitude cycle-to-cycle variations, respectively. Both features have been largely used for the description of pathological voices, and since they characterise some aspects concerning particular voices, they are expected to have a certain degree of speaker specificity. In the current work, jitter and shimmer are successfully used ...

The Search for the Self in Beckett's Theatre: Waiting for Godot and Endgame

This thesis is based upon the works of Samuel Beckett, one of the greatest writers of contemporary literature. Here, I have tried to focus on one of the main themes in Beckett's works: the search for the real "me" or the real self, which is a problem to be solved not only for Beckett's man but also for each of us. I have tried to show Beckett's techniques in approaching this unattainable goal, base...

Study of Cohesive Devices in the Textbook of English for the Students of Psychology by Rastegarpour

This study investigates the cohesive devices used in the textbook of English for the students of psychology. The research questions and hypotheses in the present study concern the frequency and distribution of grammatical and lexical cohesive devices. To answer the questions, all grammatical and lexical cohesive devices in reading comprehension passages from 6 units of 21 units th...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3301616